RT Network Analysis - Metric Exploration

Author

Harshini Karthikeyan, Alex Ralston, Jack Colt, Chelsey Harper

Code
library(tidyverse)
library(purrr)
library(lubridate)
library(kableExtra)
devtools::load_all("../")

Variants of Betweenness

Standard Betweenness Centrality

What is betweenness centrality?
For a given node, the sum, over all pairs of other nodes, of the fraction of shortest paths between the pair that pass through that node.

What it describes in the network:
Betweenness highlights individuals who facilitate direct and indirect interactions between nodes. It captures a quality of bridging two nodes, but not necessarily bridging between two clusters or political factions, in our case.

Why does this fall short when considering our research questions?
Betweenness needs to be modified to capture moderating behavior between Government and Opposition. It is also a global measure rather than a direct indication of local bridging behavior. Finally, standard betweenness does not take weights into account, and it assumes that shortest paths are the relevant paths.

Formula: \[ C_B(v) = \sum_{s \neq v \neq t}\frac{\sigma_{st}(v)}{\sigma_{st}}\]

Where:
- \(\sigma_{st}\): Total number of shortest paths from node \(s\) to node \(t\).
- \(\sigma_{st}(v)\): Number of shortest paths from node \(s\) to \(t\) that pass through \(v\).
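
As a small worked example (a hypothetical four-node graph, not drawn from our data), take nodes \(s, a, b, t\) with edges \(s\)-\(a\), \(s\)-\(b\), \(a\)-\(t\), and \(b\)-\(t\). There are two shortest paths from \(s\) to \(t\), one through \(a\) and one through \(b\), and no other pair has a shortest path through \(a\). Counting each unordered pair once:

\[ C_B(a) = \frac{\sigma_{st}(a)}{\sigma_{st}} = \frac{1}{2} \]

Node \(a\) gets only half credit for the \(s\)-\(t\) pair because \(b\) offers an equally short alternative.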

Candidate Variant #1 - Cross-Betweenness

What are we altering?
Instead of considering the shortest path between every pair of nodes, we only use distinct pairs of nodes with opposing affiliation (Government and Opposition).

Here we only evaluate the ability of a node to bridge the gap between the factions.

We ignore connectivity within each faction, since this is not relevant to our research.

Formula: \[ C_B(v) = \sum_{o \neq v \neq g}\frac{\sigma_{og}(v)}{\sigma_{og}}\]

Where:
- \(\sigma_{og}\): Total number of shortest paths from node \(o\), an opposition node, to node \(g\), a government node.
- \(\sigma_{og}(v)\): Number of shortest paths from node \(o\) to \(g\) that pass through \(v\).
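
The formula above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the project's own function (which is not shown in this report): the graph is unweighted and undirected, it is stored as an adjacency list, each (opposition, government) pair is counted once, and the helper names `bfs_counts` and `cross_betweenness_sketch` are hypothetical. It relies on the fact that \(v\) lies on a shortest \(o\)-\(g\) path exactly when \(d(o,v) + d(v,g) = d(o,g)\), in which case \(\sigma_{og}(v) = \sigma_{ov} \cdot \sigma_{vg}\).

```r
# Hypothetical helper: breadth-first search from `src` on an unweighted,
# undirected graph given as an adjacency list of integer vectors.
# Returns hop distances and the number of shortest paths to every node.
bfs_counts <- function(adj, src) {
  n <- length(adj)
  dist <- rep(Inf, n)
  sigma <- rep(0, n)
  dist[src] <- 0
  sigma[src] <- 1
  queue <- src
  while (length(queue) > 0) {
    u <- queue[1]
    queue <- queue[-1]
    for (w in adj[[u]]) {
      if (is.infinite(dist[w])) {        # first time we reach w
        dist[w] <- dist[u] + 1
        queue <- c(queue, w)
      }
      if (dist[w] == dist[u] + 1) {      # u precedes w on a shortest path
        sigma[w] <- sigma[w] + sigma[u]
      }
    }
  }
  list(dist = dist, sigma = sigma)
}

# Hypothetical sketch of cross-betweenness: sum, over (opposition, government)
# pairs only, of the share of shortest paths passing through each node.
cross_betweenness_sketch <- function(adj, affiliation) {
  n <- length(adj)
  runs <- lapply(seq_len(n), function(s) bfs_counts(adj, s))
  opp <- which(affiliation == "Opposition")
  gov <- which(affiliation == "Government")
  score <- numeric(n)
  for (o in opp) {
    for (g in gov) {
      d_og <- runs[[o]]$dist[g]
      if (is.infinite(d_og)) next        # unreachable pairs contribute nothing
      for (v in setdiff(seq_len(n), c(o, g))) {
        if (runs[[o]]$dist[v] + runs[[v]]$dist[g] == d_og) {
          score[v] <- score[v] +
            runs[[o]]$sigma[v] * runs[[v]]$sigma[g] / runs[[o]]$sigma[g]
        }
      }
    }
  }
  score
}

# Toy path graph 1 - 2 - 3 - 4 - 5 with made-up affiliations
adj <- list(2L, c(1L, 3L), c(2L, 4L), c(3L, 5L), 4L)
affiliation <- c("Opposition", "Opposition", "Expert", "Government", "Government")
cb <- cross_betweenness_sketch(adj, affiliation)
print(cb)  # 0 2 4 2 0
```

In this toy graph the expert (node 3) sits on every Opposition-Government path and scores 4. Node 2 scores 2, lower than the 3 it would receive under standard betweenness with each unordered pair counted once, because the pair (1, 3) does not span the two factions and is ignored.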

Concerns / Potential Pitfalls:
Large, dense clusters can skew the centrality, inflating the scores of nodes because they lie on multiple paths within the same cluster. This could mask who is genuinely “important” as a moderator.
Perhaps indirect interactions are not as important as direct interactions, in which case the global nature of this method doesn’t help us in our research.

Exploring this Variant (Cross-Betweenness)

Below is a glimpse of the 20 highest cross-betweenness results. Each result is currently calculated for a single year at a time, only because our function still needs to be optimized (happening soon).

The highest cross-betweenness results are exhibited by Walerian Pańko in 1982 and Kazimierz Obsadny in 1987.

Code
# loading results of candidate variant #1 "Cross Betweenness"
cb_by_year <- read_csv("cb_initial_results.csv")

# slice_max() already returns rows sorted by descending CrossBetweenness
cb_by_year |>
  slice_max(order_by = CrossBetweenness, n = 20) |>
  kable(format = "html", caption = "Highest 20 Cross-Betweenness Results") |>
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
Highest 20 Cross-Betweenness Results
Member.ID CrossBetweenness Start.Date End.Date Full.Name RT.Affiliation
MEM0142 14161.798 1982-01-01 1982-12-31 Walerian Pańko Opposition
MEM0272 12024.596 1987-01-01 1987-12-31 Kazimierz Obsadny Government
MEM0229 5354.350 1985-01-01 1985-12-31 Andrzej Ziabicki Expert
MEM0247 4597.045 1989-01-01 1989-12-31 Alfred Miodowicz Government
MEM0229 4252.025 1983-01-01 1983-12-31 Andrzej Ziabicki Expert
MEM0229 4196.185 1984-01-01 1984-12-31 Andrzej Ziabicki Expert
MEM0230 4154.471 1989-01-01 1989-12-31 Tadeusz Zieliński Opposition
MEM0230 4036.975 1988-01-01 1988-12-31 Tadeusz Zieliński Opposition
MEM0247 4028.795 1988-01-01 1988-12-31 Alfred Miodowicz Government
MEM0247 3922.914 1986-01-01 1986-12-31 Alfred Miodowicz Government
MEM0247 3381.670 1984-01-01 1984-12-31 Alfred Miodowicz Government
MEM0247 3283.586 1985-01-01 1985-12-31 Alfred Miodowicz Government
MEM0252 3211.582 1986-01-01 1986-12-31 Władysław Siła-Nowicki Government
MEM0084 3137.443 1989-01-01 1989-12-31 Stefan Jurczak Opposition
MEM0252 3098.462 1987-01-01 1987-12-31 Władysław Siła-Nowicki Government
MEM0084 3068.911 1986-01-01 1986-12-31 Stefan Jurczak Opposition
MEM0115 3043.500 1979-01-01 1979-12-31 Wojciech Lamentowicz Opposition
MEM0272 2991.712 1986-01-01 1986-12-31 Kazimierz Obsadny Government
MEM0115 2888.576 1978-01-01 1978-12-31 Wojciech Lamentowicz Opposition
MEM0272 2785.655 1989-01-01 1989-12-31 Kazimierz Obsadny Government

When we compare Jacek Kuroń and Lech Wałęsa as Dr. Bodwin suggested, we see their cross-betweenness plots exhibit the predicted inversion.

Code
# Comparing Walesa and Kuron as suggested by Bodwin

cb_by_year |> 
  filter(Full.Name %in% c("Jacek Kuroń", "Lech Wałęsa")) |> 
    ggplot(mapping = aes(x = Start.Date,
                       y = CrossBetweenness,
                       group = Member.ID,
                       color = Full.Name)) +
  geom_line(linewidth = .75) +
  geom_point() +
  labs(title = 'Examining Cross-Betweenness in Kuroń and Wałęsa',
       x = 'Year', 
       y = 'Cross-Betweenness',
       color = '') +
  theme_bw() +
  theme(plot.title = element_text(face = "bold", size = 14),
        legend.text = element_text(size = 12))

Next, we find the individuals with the highest average cross-betweenness. Plotting their scores, we see consistently high cross-betweenness from Andrzej Ziabicki and Jan Waleczek, while Stefan Jurczak and Tadeusz Zieliński spike in the later years.

Code
# I would like to see what individuals have the highest cross-betweenness on average
# filter the data down to include only these
# then visualize

top_5 <- cb_by_year |>
  group_by(Member.ID) |>
  summarise(avg_cb = mean(CrossBetweenness),
            Full.Name = first(Full.Name),
            RT.Affiliation = first(RT.Affiliation)) |>
  # slice_max() already returns rows sorted by descending avg_cb
  slice_max(order_by = avg_cb, n = 5)

top_5 |>
  kable(format = "html", caption = "Highest 5 Average Cross-Betweenness Over Time") |>
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
Highest 5 Average Cross-Betweenness Over Time
Member.ID avg_cb Full.Name RT.Affiliation
MEM0272 1422.9342 Kazimierz Obsadny Government
MEM0229 1155.6874 Andrzej Ziabicki Expert
MEM0230 1059.0505 Tadeusz Zieliński Opposition
MEM0084 1025.1236 Stefan Jurczak Opposition
MEM0462 923.2615 Jan Waleczek Government
Code
cb_top_5_plot <- cb_by_year |> 
  semi_join(top_5, by = 'Member.ID') 

cb_top_5_plot |>
  ggplot(aes(x = Start.Date, y = CrossBetweenness, color = Full.Name)) +
  geom_line(linewidth = 0.75) +
  # geom_point() +
  theme_bw() +
  labs(title = "Cross-Betweenness For Highest Average CB Individuals",
       x = "Year",
       y = "Cross-Betweenness",
       color = "") +
   theme(plot.title = element_text(face = "bold", size = 18),
        legend.text = element_text(size = 12))

Here we look at the five experts with the highest average cross-betweenness scores.

Code
top_5_experts <- cb_by_year |>
  # compare against a single string rather than a length-one vector
  filter(RT.Affiliation == "Expert") |>
  group_by(Member.ID) |>
  summarise(avg_cb = mean(CrossBetweenness),
            Full.Name = first(Full.Name)) |>
  # slice_max() already returns rows sorted by descending avg_cb
  slice_max(order_by = avg_cb, n = 5)

# keep only the yearly rows for these five experts
cb_top_5_expert_plot <- cb_by_year |>
  semi_join(top_5_experts, by = 'Member.ID')

top_5_experts |>
  kable(format = "html", caption = "Highest Average Cross-Betweenness Experts Over Time") |>
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
Highest Average Cross-Betweenness Experts Over Time
Member.ID avg_cb Full.Name
MEM0229 1155.6874 Andrzej Ziabicki
MEM0139 820.8965 Edward Olszewski
MEM0293 428.7067 Adam Lipowski
MEM0211 190.4215 Jerzy Wertenstein-Żuławski
MEM0572 177.3812 Maciej Szumowski
Code
cb_top_5_expert_plot |>
  ggplot(aes(x = Start.Date, y = CrossBetweenness, color = Full.Name)) +
  geom_line(linewidth = 0.75) +
  # geom_point() +
  theme_bw() +
  labs(title = "Cross-Betweenness For Highest Average CB Experts",
       x = "Year",
       y = "Cross-Betweenness",
       color = "") +
   theme(plot.title = element_text(face = "bold", size = 18),
        legend.text = element_text(size = 12))

Code
# Comparing Pańko and Obsadny

cb_by_year |> 
  filter(Full.Name %in% c("Walerian Pańko", "Kazimierz Obsadny")) |> 
    ggplot(mapping = aes(x = Start.Date,
                       y = CrossBetweenness,
                       group = Member.ID,
                       color = Full.Name)) +
  geom_line(linewidth = .75) +
  geom_point() +
  labs(title = 'Examining Cross-Betweenness in Pańko and Obsadny',
       x = 'Year', 
       y = 'Cross-Betweenness',
       color = '') +
  theme_bw() +
  theme(plot.title = element_text(face = "bold", size = 14),
        legend.text = element_text(size = 12))

These high spikes pique our interest. Pańko’s spike in 1982 coincides with his departure from the PZPR. We looked at the network app around this time and saw behavior in the graph that seems to validate our metric calculation.

Walerian Pańko – 1979
[Figure: Pańko 1979]

Walerian Pańko – 1980
[Figure: Pańko 1980]

Walerian Pańko – 1981
[Figure: Pańko 1981]

Walerian Pańko – 1983
[Figure: Pańko 1983]

Dr. Domber also suggested taking a look at two individuals who may be good examples of potential moderation: Władysław Siła-Nowicki and Wojciech Lamentowicz.

Code
# Comparing Władysław Siła-Nowicki and Wojciech Lamentowicz

cb_by_year |> 
  filter(Member.ID %in% c("MEM0252", "MEM0115")) |> 
    ggplot(mapping = aes(x = Start.Date,
                       y = CrossBetweenness,
                       group = Member.ID,
                       color = Full.Name)) +
  geom_line(linewidth = .75) +
  geom_point() +
  labs(title = 'Examining Cross-Betweenness in Siła-Nowicki and Lamentowicz',
       x = 'Year', 
       y = 'Cross-Betweenness',
       color = '') +
  theme_bw() +
  theme(plot.title = element_text(face = "bold", size = 14),
        legend.text = element_text(size = 12))

Candidate Variant #2

What are we altering?
We still consider the shortest paths between pairs of nodes from opposing factions, but to combat the score inflation caused by large organizations, we introduce a normalizing factor.
We divide each pair's contribution by the product of the sizes of the clusters containing the two endpoint nodes.

Formula: \[ C_B(v) = \sum_{o \neq v \neq g}\frac{\sigma_{og}(v)}{\sigma_{og}} \cdot \frac{1}{|C_o| \cdot |C_g|}\]

Where:
- \(\sigma_{og}\): Total number of shortest paths from node \(o\), an opposition node, to node \(g\), a government node.
- \(\sigma_{og}(v)\): Number of shortest paths from node \(o\) to \(g\) that pass through \(v\).
- \(|C_o|\): Size of the cluster containing node \(o\).
- \(|C_g|\): Size of the cluster containing node \(g\).
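
To see what the normalizing factor does, here is a minimal arithmetic sketch in base R for a single node \(v\). All numbers are made up for illustration, and the column names (`path_share`, `size_o`, `size_g`) are hypothetical. The path shares \(\sigma_{og}(v)/\sigma_{og}\) and the cluster sizes are taken as given rather than computed from a graph.

```r
# Hypothetical inputs for one node v: each row is an (o, g) pair.
pairs <- data.frame(
  path_share = c(1.0, 0.5, 1.0),  # sigma_og(v) / sigma_og (made-up values)
  size_o     = c(2, 2, 10),       # |C_o|, size of the cluster containing o
  size_g     = c(5, 4, 10)        # |C_g|, size of the cluster containing g
)

# Variant #1: plain sum of path shares
unnormalized <- sum(pairs$path_share)

# Variant #2: each share is divided by |C_o| * |C_g| before summing
normalized <- sum(pairs$path_share / (pairs$size_o * pairs$size_g))

print(unnormalized)  # 2.5
print(normalized)    # 0.1725
```

The third pair, which links two ten-member clusters, contributes 1 to the unnormalized score but only 0.01 after normalization. The first pair, which links much smaller clusters, still contributes 0.1.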

Concerns / Potential Pitfalls:
We are still considering indirect interactions here, which may or may not be appropriate.
While this keeps large clusters from skewing the results, we may be giving undue influence to smaller clusters. Perhaps adjusting the normalization by some factor could help.

Up Next

  • variant that adjusts the normalization factor

  • variant that explores a decay factor to limit how much indirect interactions contribute.

  • variant that considers only direct paths.

  • variants of other standard metrics - like eigenvector centrality

  • exploring ratio idea